
Online Meta-learning by Parallel Algorithm Competition

Abstract

The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, which arguably has become more of an issue recently with the success of deep reinforcement learning in high-dimensional state spaces. The long learning times in domains such as Atari 2600 video games make it infeasible to perform comprehensive searches for appropriate meta-parameter values. We propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters. After a fixed number of episodes, the instances are selected based on their performance in the task at hand. Before continuing the learning, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in standard Tetris with a smaller, 10$\times$10, board, by 31% and 84%, respectively, and by improving the results for deep Sarsa($\lambda$) agents in three Atari 2600 games by 62% or more. The experiments also show the ability of the OMPAC method to adapt the meta-parameters according to the learning progress in different tasks.
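The selection-and-perturbation loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract only states that instances are selected by performance and that Gaussian noise is added to the meta-parameters with a predefined probability, so the truncation-selection scheme, the parameter names, and the noise scale below are all assumptions.

```python
import copy
import random

def ompac_round(metaparams, scores, noise_prob=0.2, noise_std=0.1):
    """One OMPAC selection round (illustrative sketch).

    metaparams: list of dicts mapping meta-parameter name -> value,
                one dict per parallel learner instance.
    scores:     task performance of each instance over the last
                block of episodes.
    Returns the meta-parameter sets for the next block of episodes.
    """
    # Rank instances by performance and clone the better half over
    # the worse half. Truncation selection is an assumption here;
    # the paper only says selection is performance-based.
    order = sorted(range(len(metaparams)),
                   key=lambda i: scores[i], reverse=True)
    survivors = [copy.deepcopy(metaparams[i])
                 for i in order[: max(1, len(order) // 2)]]
    next_gen = [copy.deepcopy(survivors[i % len(survivors)])
                for i in range(len(metaparams))]

    # With a predefined probability, perturb each meta-parameter
    # with Gaussian noise before learning continues.
    for params in next_gen:
        for name in params:
            if random.random() < noise_prob:
                params[name] += random.gauss(0.0, noise_std)
    return next_gen
```

In use, each parallel learner would train for a fixed number of episodes, report its score, and receive its (possibly perturbed) meta-parameters for the next block from `ompac_round`.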
